Packages

library(ggplot2) # grammar of graphics
library(palmerpenguins) # penguins data
library(dplyr) # data wrangling
library(tidyr) # tidying the data
library(patchwork) # multiple plot alignment
library(plotly) # interactive plot
library(ggiraph) # interactive plot

Graphs

An essential part of data analysis. Data sets with the same summary statistics can look very different when plotted, for example Anscombe’s quartet and the Datasaurus Dozen.

Notes: Graphing is an essential part of data analysis. Summary statistics do not always reflect what the data look like.

Anscombe’s quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed.

A more modern example is the Datasaurus Dozen, a set of 13 x-y data sets that have nearly identical summary statistics but look very different when plotted. One of the plots is in fact a dinosaur.
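
As a quick check of this claim, the sketch below uses the `anscombe` data frame that ships with base R (columns x1 to x4 and y1 to y4) and computes the mean, variance and correlation for each of the four x/y pairs. The object name `quartet_summary` is only illustrative.

```r
# Summarise each of the four Anscombe data sets (built into base R).
quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(set = i,
             mean_x = mean(x), mean_y = mean(y),
             var_x = var(x), var_y = var(y),
             cor_xy = cor(x, y))
}))

# The rows are nearly identical even though the scatterplots differ a lot.
print(round(quartet_summary, 2))
```

Plotting the four pairs side by side is what reveals the differences that this table hides.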

Base Graphics vs ggplot2

hist(penguins$flipper_length_mm)

ggplot(penguins,aes(flipper_length_mm))+
  geom_histogram(bins = 13)
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

Warning message? It is OK to see the following message throughout the commands: ## Warning: Removed 2 rows containing missing values. It appears because the penguins data has 2 rows with missing values (NA) in the plotted variables.
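
To see where the warning comes from, one could count the incomplete rows directly. The sketch below assumes the ggplot2 and palmerpenguins packages are installed; the object names are illustrative, and `na.rm = TRUE` is a standard geom argument that drops missing values silently.

```r
library(ggplot2)
library(palmerpenguins)

# Rows where either plotted variable is missing.
plot_vars <- penguins[, c("flipper_length_mm", "body_mass_g")]
n_missing <- sum(!complete.cases(plot_vars))
n_missing

# na.rm = TRUE removes those rows without emitting the warning.
p_quiet <- ggplot(penguins, aes(flipper_length_mm, body_mass_g)) +
  geom_point(na.rm = TRUE)
```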

Notes: For simple graphs, the base plot seems to take minimal coding effort compared to a ggplot graph.

Base Graphics vs ggplot2

Base R plot

plot(penguins$flipper_length_mm,penguins$body_mass_g,
     col=c("red","green","blue")[penguins$species],
     pch=c(0,1,2)[penguins$species])
legend(x=172,y=6300,
       legend=c("Adelie","Chinstrap","Gentoo"),
       pch=c(0,1,2),col=c("red","green","blue"))

ggplot(penguins, aes(flipper_length_mm,body_mass_g, 
  color=species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: For anything beyond extremely basic plots, base plotting quickly becomes complex. More importantly, base plots lack consistency in their functions and plotting strategy.

Why ggplot2?

Notes: ggplot2 has a consistent logic and more structured code for plotting. There is a bit of a learning curve, but once the code syntax and the logic are clear, it becomes easy to plot a huge variety of graphs.

Grammar Of Graphics

Notes: Traditional graphing tools generally have an independent set of rules for each kind of graph, and the graphs are labelled differently, such as barplots, scatterplots, boxplots etc. Each graph has its own function and plotting strategy.

Leland Wilkinson’s The Grammar of Graphics introduces the idea that any kind of graph can be created by following a set of rules, and puts forward a framework that enables this.

Grammar of graphics (GOG) tries to unify all graphs under a common umbrella. GOG brings the idea that graphs are made up of discrete elements (data, aesthetics, geometry, statistics, coordinates, facets, themes etc) which can be mixed and matched to create any plot. This creates a consistent underlying framework for graphing.

ggplot (Grammar of graphics) was built in R by Hadley Wickham in 2005 as an implementation of Leland Wilkinson’s book The Grammar of Graphics.
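
As an illustrative sketch of this mix-and-match idea, the plot below names each grammar component explicitly. The particular combination (linear smoothing, faceting by island, black-and-white theme) is an arbitrary choice for demonstration and assumes ggplot2 and palmerpenguins are installed.

```r
library(ggplot2)
library(palmerpenguins)

p_grammar <- ggplot(data = penguins,                       # data
                    mapping = aes(x = flipper_length_mm,   # aesthetics
                                  y = body_mass_g,
                                  colour = species)) +
  geom_point(na.rm = TRUE) +                               # geometry
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE, na.rm = TRUE) +                  # statistics
  facet_wrap(~island) +                                    # facets
  coord_cartesian() +                                      # coordinates
  theme_bw()                                               # theme
p_grammar
```

Swapping any single line, for example `theme_bw()` for `theme_grey()`, changes only that component and leaves the rest of the plot specification untouched.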

Building A Graph

# library(palmerpenguins) # ?penguins

ggplot(data = penguins)

Building A Graph

# library(palmerpenguins) # ?penguins

ggplot(data = penguins,
mapping = aes(x = flipper_length_mm, y = body_mass_g))

Building A Graph

# library(palmerpenguins) # ?penguins

ggplot(data = penguins,
mapping = aes(x = flipper_length_mm, y = body_mass_g)) + 
geom_point()
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Building A Graph

# library(palmerpenguins) # ?penguins

ggplot(data = penguins,
mapping = aes(x = flipper_length_mm, y = body_mass_g, 
colour = species)) +
geom_point()

Or

# library(palmerpenguins) # ?penguins

ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g, 
colour = species))

Data • penguins

## # A tibble: 3 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## # ℹ 2 more variables: sex <fct>, year <int>
glimpse(penguins) # or str(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Notes: It’s a good idea to use str() to check the input dataframe to make sure that numbers are actually numbers and not characters, for example. Verify that factors are correctly assigned.
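
A minimal sketch of such a check, using a small made-up data frame (`toy` and its columns are hypothetical) in which a numeric column was read in as text:

```r
# Hypothetical data: 'mass' was accidentally stored as character.
toy <- data.frame(species = c("Adelie", "Gentoo", "Adelie"),
                  mass = c("3750", "5000", "3800"),
                  stringsAsFactors = FALSE)
str(toy)

# Convert columns to the types that plotting functions expect.
toy$mass <- as.numeric(toy$mass)
toy$species <- factor(toy$species)

# One class per column makes a compact overview.
col_classes <- sapply(toy, class)
col_classes
```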

Data • format

Wide

head(penguins, n=4)
# A tibble: 4 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
# ℹ 2 more variables: sex <fct>, year <int>

Long

penguins %>% tidyr::pivot_longer(
  cols=c(bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g), 
  names_to= "variables", values_to="value",values_drop_na = TRUE) %>%
  as.data.frame() %>% head(n=4)
  species    island  sex year         variables  value
1  Adelie Torgersen male 2007    bill_length_mm   39.1
2  Adelie Torgersen male 2007     bill_depth_mm   18.7
3  Adelie Torgersen male 2007 flipper_length_mm  181.0
4  Adelie Torgersen male 2007       body_mass_g 3750.0

Notes: The data must be cleaned up and prepared for plotting. The data must be ‘tidy’. Columns must be variables and rows must be observations. The data can then be in wide or long format depending on the variables to be plotted.
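
One reason to reshape to long format is that a single column can then drive facets or colours. The sketch below, which assumes tidyr, ggplot2 and palmerpenguins are available, pivots the four measurement columns and draws one histogram panel per variable. The object names and the choice of 20 bins are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

penguins_long <- tidyr::pivot_longer(
  penguins,
  cols = c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
  names_to = "variables", values_to = "value",
  values_drop_na = TRUE)

# Each measurement gets its own panel with its own x-axis range.
p_long <- ggplot(penguins_long, aes(x = value, fill = species)) +
  geom_histogram(bins = 20) +
  facet_wrap(~variables, scales = "free")
p_long
```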

Geoms • types

p <- ggplot(data = penguins)

# scatterplot
p + geom_point(aes(x=flipper_length_mm,y=body_mass_g))

# barplot
p + geom_bar(aes(x=species))

# boxplot
p + geom_boxplot(aes(x=species,y=body_mass_g))

# search
help.search("^geom_",package="ggplot2")

Plots can be assigned to any object name and then combined

p <- ggplot(penguins)

# scatterplot
scatterplot <- p + geom_point(aes(x=flipper_length_mm,y=body_mass_g))

# barplot
barplot <- p + geom_bar(aes(x=flipper_length_mm))

# boxplot
boxplot <- p + geom_boxplot(aes(x=species,y=body_mass_g))

wrap_plots(scatterplot, barplot, boxplot, ncol = 1)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing non-finite values (`stat_count()`).
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

Notes: Geoms are the geometric components of a graph such as points, lines etc used to represent data. The same data can be visually represented in different geoms. For example, points or bars. Mandatory input requirements change depending on geoms.
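
The mandatory inputs of a geom can be inspected on its underlying ggproto object. This is a small sketch: `GeomPoint` and `GeomText` are objects exported by ggplot2, and `required_aes` is the field that lists the aesthetics a geom cannot work without. The variable names are illustrative.

```r
library(ggplot2)

# Points need x and y; text additionally needs a label.
point_needs <- GeomPoint$required_aes
text_needs <- GeomText$required_aes
point_needs
text_needs
```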

Stats

x <- ggplot(data = penguins) + 
  geom_bar(aes(x=flipper_length_mm),stat="bin")

y <- ggplot(data = penguins) + 
  geom_bar(aes(x=species),stat="count")

z <- ggplot(data = penguins) + 
  geom_bar(aes(x=species,y=flipper_length_mm),stat="identity")

wrap_plots(x,y,z,nrow=1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`position_stack()`).

x <- ggplot(data = penguins) + 
  stat_bin(aes(x=flipper_length_mm),geom="bar")

y <- ggplot(data = penguins) + 
  stat_count(aes(x=species),geom="bar")

z <- ggplot(data = penguins) + 
  stat_identity(aes(x=species,y=flipper_length_mm),geom="bar")

wrap_plots(x,y,z,nrow=1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

Notes: Normally the data is plotted directly from the input as it is. Some plots require the data to be computed or transformed first, e.g. boxplots, histograms, smoothing, predictions and regression.
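
To make the role of the stat concrete, the sketch below draws the same bar chart twice: once letting `stat_count()` do the counting, and once counting beforehand with dplyr and plotting the values as they are. `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`; the object names are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

# Counting done by ggplot2 (stat = "count" is the geom_bar default).
p_stat <- ggplot(penguins) + geom_bar(aes(x = species))

# Counting done up front; the geom then plots the values unchanged.
species_n <- dplyr::count(penguins, species)
p_pre <- ggplot(species_n) + geom_col(aes(x = species, y = n))

# Both plots contain the same bar heights.
layer_data(p_stat)$count
layer_data(p_pre)$y
```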

Stats

data.frame("plot"=c("histogram","smooth","boxplot","density","freqpoly"),
           "stat"=c("bin","smooth","boxplot","density","bin"),
           "geom"=c("bar","line","boxplot","line","line")) 
##        plot    stat    geom
## 1 histogram     bin     bar
## 2    smooth  smooth    line
## 3   boxplot boxplot boxplot
## 4   density density    line
## 5  freqpoly     bin    line
stat_bin()
stat_count()
stat_density()
stat_bin_2d()
stat_bin_hex()
stat_contour()
stat_boxplot()
stat_smooth()
stat_quantile()

Use args(geom_bar) to check arguments.

Position

p <- ggplot(penguins,aes(x=year,y=body_mass_g,fill=species))
p + geom_bar(stat="identity",
             position="stack")
## Warning: Removed 2 rows containing missing values (`position_stack()`).

p + geom_bar(stat="identity",
             position="dodge")
## Warning: Removed 2 rows containing missing values (`geom_bar()`).

p + geom_bar(stat="identity",
             position="fill")
## Warning: Removed 2 rows containing missing values (`position_stack()`).

Aesthetics

ggplot(data = penguins)+
  geom_point(aes(x=flipper_length_mm,
                 y=body_mass_g,
                 size=bill_length_mm,
                 alpha=bill_depth_mm,
                 shape=species,
                 color=species))
## Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins)+
  geom_point(aes(x=flipper_length_mm,
                 y=body_mass_g),
                 size=2,
                 alpha=0.8,
                 shape=15,
                 color="steelblue")
## Warning: Removed 2 rows containing missing values (`geom_point()`).

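Notes: Aesthetics control the visual properties of geoms. A property such as size or colour can either be mapped to a variable inside aes(), so that it varies with the data, or set to a fixed value outside aes().

The following is incorrect: size=2 inside aes() is treated as a data variable to be mapped through the size scale rather than as a fixed size, which also adds an unwanted legend.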

ggplot(penguins)+
geom_point(aes(x=flipper_length_mm,y=body_mass_g,size=2))
## Warning: Removed 2 rows containing missing values (`geom_point()`).
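
The difference between mapping and setting can be checked on the plot objects themselves. In this sketch (object names are illustrative), `size = 2` inside `aes()` is stored as a mapping, whereas outside `aes()` it is stored as a fixed layer parameter.

```r
library(ggplot2)
library(palmerpenguins)

# Inside aes(): the constant 2 is treated like a data variable.
p_mapped <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, size = 2))

# Outside aes(): 2 is a fixed point size and no legend is drawn.
p_fixed <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g), size = 2)

names(p_mapped$layers[[1]]$mapping)     # includes "size"
names(p_fixed$layers[[1]]$aes_params)   # includes "size"
```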

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()






p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: Multiple geoms can be plotted one after the other. The order in which items are specified in the command dictates the plotting order on the actual plot.

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()





p
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()+
      geom_smooth()




p
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()+
      geom_smooth()+
      geom_rug()



p
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()+
      geom_smooth()+
      geom_rug()+
      geom_step()


p
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).

Multiple Geoms

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()+
      geom_smooth()+
      geom_rug()+
      geom_step()+
      geom_text(data=subset(penguins,penguins$species=="Adelie"),
                aes(label=species))
p
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_text()`).

Notes: Multiple geoms can be added one after the other. Layers are drawn in the order in which they are specified, so each geom is drawn on top of the ones added before it.

In this case, the lines are drawn over the points.

ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_point()+
      geom_line()
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`geom_line()`).

while here the points are drawn over the lines.

ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g))+
      geom_line()+
      geom_point()
## Warning: Removed 2 rows containing missing values (`geom_line()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Each geom inherits its data and aesthetic mappings from ggplot(). If a geom needs extra aesthetics, they can be specified inside that geom’s own aes().

The data can also be changed for specific geoms if needed, using the data argument of the geom.

Just because you can doesn’t mean you should!
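
A hedged sketch of both ideas: the smoothing layer adds its own `colour` mapping on top of the inherited x and y, and a second point layer receives a different data set through its `data` argument. The Gentoo subset and the open-circle styling are arbitrary choices for illustration.

```r
library(ggplot2)
library(palmerpenguins)

gentoo <- subset(penguins, species == "Gentoo")

p_layers <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(colour = "grey60", na.rm = TRUE) +
  # Layer-specific data: only Gentoo penguins, drawn as open circles.
  geom_point(data = gentoo, shape = 1, size = 3, colour = "black",
             na.rm = TRUE) +
  # Layer-specific aesthetic: one fitted line per species.
  geom_smooth(aes(colour = species), method = "lm", formula = y ~ x,
              se = FALSE, na.rm = TRUE)
p_layers
```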

Facets • facet_wrap

p <- ggplot(penguins)+
      geom_point(aes(x=flipper_length_mm,
                     y=body_mass_g,
                     color=species))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + facet_wrap(~species)
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + facet_wrap(~species,nrow=3)
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: facet_wrap is used to split a plot into subplots based on the categories in one or more variables.

Facets • facet_grid

p <- ggplot(data = penguins, aes(flipper_length_mm,body_mass_g))+
     geom_point()
p + facet_grid(~island+year)
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + facet_grid(island~year)
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: facet_grid is also used to split a plot into subplots based on the categories in one or more variables. facet_grid can be used to create a matrix-like grid of two variables.
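
Facets can also be specified with `vars()` instead of a formula, and panels can be given independent axes with the `scales` argument. The following sketch shows both; the choice of faceting variables and the object names are illustrative.

```r
library(ggplot2)
library(palmerpenguins)

p_base <- ggplot(penguins, aes(flipper_length_mm, body_mass_g)) +
  geom_point(na.rm = TRUE)

# Wrap panels by one variable, letting each panel choose its own axis ranges.
p_wrap <- p_base + facet_wrap(vars(species), scales = "free")

# Grid of two variables: islands as rows, years as columns.
p_grid <- p_base + facet_grid(rows = vars(island), cols = vars(year))
```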

Coordinate Systems

p <- ggplot(penguins,aes(x="",y=bill_length_mm,
            fill=species))+
  geom_bar(stat="identity")
p
## Warning: Removed 2 rows containing missing values (`position_stack()`).

p + coord_polar("y", start = 0)
## Warning: Removed 2 rows containing missing values (`position_stack()`).

Notes: The coordinate system defines the surface used to represent numbers. Most plots use the cartesian coordinate system. A pie chart, for example, is a polar coordinate projection of a stacked cartesian barplot. Maps can use numerous coordinate systems called map projections, for example UTM coordinates.

Theming

ggplot(penguins, aes(bill_length_mm)) +
    geom_histogram() +
    facet_wrap(~species, ncol = 1) +
    theme_grey()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

ggplot(penguins, aes(bill_length_mm)) +
    geom_histogram() +
    facet_wrap(~species, ncol = 1) +
    theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

Notes: Themes allow all non-data components of the plot to be modified. This is the visual appearance of the plot. Examples include the axis line thickness, the background color and the font family.

Theme • Legend

p <- ggplot(penguins)+
      geom_point(aes(x=flipper_length_mm, 
                     y=body_mass_g, 
                     color=species))

at top

p + theme(legend.position="top")
## Warning: Removed 2 rows containing missing values (`geom_point()`).

at bottom

p + theme(legend.position="bottom")
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Theme • Text

element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL,
             vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
p <- ggplot(penguins, aes(x = flipper_length_mm,
                          y = body_mass_g, 
                          alpha = bill_length_mm,
                          shape = island)) +
  geom_point() +
  facet_grid(island~year) +
  labs(title="Title",
       subtitle="subtitle")
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Theme • Text

element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL,
             vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
p <- p + 
  theme(axis.title=element_text(color="#e41a1c"),
    axis.text=element_text(color="#377eb8"),
    plot.title=element_text(color="#4daf4a"),
    plot.subtitle=element_text(color="#984ea3"),
    legend.text=element_text(color="#ff7f00"),
    legend.title=element_text(color="#ffff33"),
    strip.text=element_text(color="#a65628"))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Saving plots

p <- ggplot(penguins,aes(bill_length_mm,flipper_length_mm,color=species)) + 
  geom_point()
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

ggsave("plot.png",p,height=5,width=7,units="cm",dpi=200)
# Note that the default units in png() are pixels, while in ggsave() they are inches
png("plot.png",height=5,width=7,units="cm",res=200)
print(p)
dev.off()
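
A small sketch of saving to a temporary file, so that nothing is written to the working directory. `ggsave()` picks the graphics device from the file extension; PDF is used here only because it needs no extra system libraries, and the dimensions and object names are arbitrary.

```r
library(ggplot2)
library(palmerpenguins)

p_save <- ggplot(penguins, aes(bill_length_mm, flipper_length_mm,
                               color = species)) +
  geom_point(na.rm = TRUE)

# Write the plot to a temporary PDF, 12 cm wide and 8 cm high.
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p_save, width = 12, height = 8, units = "cm")

file.exists(out_file)
```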

Combining Plots

p <- ggplot(penguins, aes(flipper_length_mm, body_mass_g,color=species)) + geom_point()
q <- ggplot(penguins, aes(year, body_mass_g, fill=species)) + geom_bar(stat="identity")
patchwork::wrap_plots(p,q)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`position_stack()`).

patchwork::wrap_plots(p,q) + 
  plot_annotation(tag_levels = 'a') & theme(plot.tag = element_text(size = 14))
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing missing values (`position_stack()`).

[Refer to patchwork documentation.]

Notes: Combining two or more ggplot2 plots is often required, and several packages exist to help with this. Some functions place plots next to each other, optionally with different heights or widths for each plot. Some functions allow one plot to be drawn on top of another, like an inset. Here are some alternatives to patchwork.

gridExtra::grid.arrange(p,q,ncol=2)
ggpubr::ggarrange(p,q,ncol=2,
  widths=c(1.5,1),common.legend=TRUE)
cowplot::plot_grid(p,q,ncol=2)
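
patchwork also offers operators as a shorthand for `wrap_plots()`: `|` places plots side by side and `/` stacks them. The sketch below additionally uses `plot_layout()` to set relative widths and collect the legends. The plots, the layout and the object names are just an example.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

p1 <- ggplot(penguins, aes(flipper_length_mm, body_mass_g, color = species)) +
  geom_point(na.rm = TRUE)
p2 <- ggplot(penguins, aes(species, body_mass_g, color = species)) +
  geom_boxplot(na.rm = TRUE)

# Side by side, first plot twice as wide, legends merged into one.
side_by_side <- (p1 | p2) + plot_layout(widths = c(2, 1), guides = "collect")

# One plot above the other.
stacked <- p1 / p2
```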

Extensions

A collection of ggplot extension packages: https://exts.ggplot2.tidyverse.org/.
Curated list of ggplot2 links: https://github.com/erikgahner/awesome-ggplot2.

Help

Acknowledgements:

SLUBI3Bs • Slides adapted from RaukR

Extra

Data • diamonds

## # A tibble: 3 × 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good    E     VS1      56.9    65   327  4.05  4.07  2.31
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

Notes: R data.frame is a tabular format with rows and columns just like a spreadsheet.

Aesthetics

x1 <- ggplot(data = penguins) +
  geom_point(aes(x=flipper_length_mm,y=body_mass_g)) +
  stat_smooth(aes(x=flipper_length_mm,y=body_mass_g))

x2 <- ggplot(data = penguins, aes(x=flipper_length_mm,y=body_mass_g)) +
  geom_point() + 
  geom_smooth()

wrap_plots(x1,x2,nrow=1,ncol=2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: If the same aesthetics are used in multiple geoms, they can be moved to ggplot(). With patchwork loaded, x1|x2 is an alternative to wrap_plots(x1,x2).

Scales • Discrete Colors

p <- ggplot(penguins)+
  geom_point(aes(x=flipper_length_mm,
    y=body_mass_g,color=species))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + scale_color_manual(
     name="Manual",
     values=c("#5BC0EB","#FDE74C","#9BC53D"))
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: Scales control how data values are translated into aesthetics. For example, when the aesthetic color is mapped to a variable x, the scale controls the palette of colors used, which color is assigned to which value, and the upper and lower limits of the data and colors.

Scales • Continuous Colors

p <- ggplot(penguins)+
      geom_point(aes(x=flipper_length_mm,
                     y=body_mass_g,
      shape=species,color=bill_length_mm))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p +
scale_color_gradient(name="Bill Len",
  breaks=range(penguins$bill_length_mm, na.rm =T),
  labels=c("Min","Max"),
  low="black",high="red")
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: Continuous colours can be changed using scale_color_gradient() for two colour gradient. Any number of breaks and colours can be specified using scale_color_gradientn().
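
As a sketch of the multi-colour case, `scale_color_gradientn()` takes a vector of colours through its `colours` argument. The three colours chosen here are arbitrary, and `p_gradn` is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)

p_gradn <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g,
                 color = bill_length_mm), na.rm = TRUE) +
  # Three-colour gradient running from short to long bills.
  scale_color_gradientn(name = "Bill Len",
                        colours = c("black", "orange", "red"))
p_gradn
```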

Scales • Shape

p <- ggplot(penguins) +
      geom_point(aes(x=flipper_length_mm,
                     y=body_mass_g,
      shape=species,color=species))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + scale_color_manual(name="New",
    values=c("blue","green","red")) +
scale_shape_manual(name="Bla",
                   values=c(0,1,2))
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Notes: The shape scale can be adjusted using scale_shape_manual(). When several aesthetics are mapped to the same variable, their legends are merged, provided the scales share the same name and labels.

Scales • Axes

p <- ggplot(penguins)+geom_point(
  aes(x=flipper_length_mm,y=body_mass_g))
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + scale_x_continuous(name="flipper_length",
        breaks=seq(172,231),limits=c(220,231))
## Warning: Removed 301 rows containing missing values (`geom_point()`).

Notes: The x and y axes are also controlled by scales. The axis break points, the break point text and limits are controlled through scales.

When setting limits using scale_, the data outside the limits are dropped. Limits can also be set using lims(x=c(3,5)) or xlim(c(3,5)), which behave the same way. To zoom in without dropping data, for example when mapping, set the limits with coord_map() or coord_cartesian() instead.
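
The difference between the two ways of setting limits can be seen in the built plot data. In this sketch (object names are illustrative), limits set on the scale turn out-of-range values into `NA` before drawing, while `coord_cartesian()` only zooms the view and keeps every observation.

```r
library(ggplot2)
library(palmerpenguins)

p_base <- ggplot(penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g), na.rm = TRUE)

# Scale limits: values outside 220-231 are censored (set to NA).
p_scale <- p_base + scale_x_continuous(limits = c(220, 231))

# Coordinate limits: same visible window, data left untouched.
p_zoom <- p_base + coord_cartesian(xlim = c(220, 231))

n_na_scale <- sum(is.na(layer_data(p_scale)$x))
n_na_zoom <- sum(is.na(layer_data(p_zoom)$x))
c(scale = n_na_scale, zoom = n_na_zoom)
```

This matters most for layers that compute statistics, since a smoother or boxplot fitted after scale limits only sees the data that survived the censoring.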

Theme • Text

element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL,
             vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
dfr <- data.frame(value=rep(1,7),label=c("axis.title","axis.text","plot.title","plot.subtitle","legend.text","legend.title","strip.text"),stringsAsFactors=FALSE) %>%
  mutate(label=factor(label,levels=c("axis.title","axis.text","plot.title","plot.subtitle","legend.text","legend.title","strip.text")))

q <- ggplot(dfr,aes(x=label,y=value,fill=label))+
  geom_bar(stat="identity")+
  labs(x="",y="")+
  coord_flip()+
  scale_fill_manual(values=c("#e41a1c","#377eb8","#4daf4a","#984ea3","#ff7f00","#ffff33","#a65628"))+
  theme_minimal(base_size=20)+
  theme(
    legend.position="none",
    axis.text.x=element_blank(),
    axis.ticks=element_blank(),
    panel.grid=element_blank())

wrap_plots(p,q,nrow=1,widths=c(3,1))
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Theme • Rect

element_rect(fill=NULL,color=NULL,linewidth=NULL,linetype=NULL)
# p <- diamonds %>%
#       filter(cut=="Fair"|cut=="Good",color=="D"|color=="E") %>%
#       droplevels() %>%
#       ggplot(aes(carat,price,alpha=color,shape=cut))+
#             geom_point()+
#             labs(title="Title",subtitle="subtitle")+
#             facet_grid(cut~color)
p <- penguins %>%
      ggplot(aes(flipper_length_mm,body_mass_g,alpha=bill_length_mm,shape=island))+
            geom_point()+
            labs(title="Title",subtitle="subtitle")+
            facet_grid(island~year)
p <- p + theme(
    plot.background=element_rect(fill="#b3e2cd"),
    panel.background=element_rect(fill="#fdcdac"),
    panel.border=element_rect(fill=NA,color="#cbd5e8",linewidth=3),
    legend.background=element_rect(fill="#f4cae4"),
    legend.box.background=element_rect(fill="#e6f5c9"),
    strip.background=element_rect(fill="#fff2ae")
)
dfr <- data.frame(value=rep(1,6),label=c("plot.background","panel.background","panel.border","legend.background","legend.box.background","strip.background"),stringsAsFactors=FALSE) %>%
  mutate(label=factor(label,levels=c("plot.background","panel.background","panel.border","legend.background","legend.box.background","strip.background")))

q <- ggplot(dfr,aes(x=label,y=value,fill=label))+
  geom_bar(stat="identity")+
  labs(x="",y="")+
  coord_flip()+
  scale_fill_manual(values=c("#b3e2cd","#fdcdac","#cbd5e8","#f4cae4","#e6f5c9","#fff2ae"))+
  theme_minimal(base_size=20)+
  theme(
    legend.position="none",
    axis.text.x=element_blank(),
    axis.ticks=element_blank(),
    panel.grid=element_blank())

wrap_plots(p,q,nrow=1,widths=c(3,1))
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Theme • Reuse

p <- penguins %>%
      droplevels() %>%
      ggplot(aes(flipper_length_mm,body_mass_g,color=island))+
            geom_point()
newtheme <- theme_bw() + theme(
  axis.ticks=element_blank(), panel.background=element_rect(fill="white"),
  panel.grid.minor=element_blank(), panel.grid.major.x=element_blank(),
  panel.grid.major.y=element_line(linewidth=0.3,color="grey90"), panel.border=element_blank(),
  legend.position="top", legend.justification="right"
)
p
## Warning: Removed 2 rows containing missing values (`geom_point()`).

p + newtheme
## Warning: Removed 2 rows containing missing values (`geom_point()`).

Interactive

p <- ggplot(penguins,aes(x=flipper_length_mm,y=body_mass_g,col=species))
p1 <- p+geom_point()
plotly::ggplotly(p1,
  width=500,height=400)
p2 <- p+
  ggiraph::geom_point_interactive(
  aes(tooltip=paste0("<b>species: </b>",species)))+
  theme_bw(base_size=12)
ggiraph::girafe(ggobj=p2)
## Warning: Removed 2 rows containing missing values (`geom_interactive_point()`).

Notes: Most interactive plotting libraries are not as complete as ggplot2. Therefore, some packages explore ways of converting ggplot2 objects into interactive graphics.

Professional themes

How BBC works with R graphics
